Combining multiple evidence for gene symbol disambiguation
Abstract
Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) based method for human gene symbol disambiguation and studied different methods to combine various types of information from available knowledge sources. Results showed that a combination of evidence usually improved performance. The combination method using coefficients obtained from a logistic regression model reached the highest precision of 92.2% on a testing set of ambiguous human gene symbols.
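The abstract does not give the scoring details, so the following is only a minimal sketch of the general idea: score each candidate gene against the context of an ambiguous symbol separately for each type of evidence, then combine the per-evidence similarities as a weighted sum. The evidence types (`words`, `mesh`), the gene identifiers, and the weight values are hypothetical; in the paper's best-performing variant the weights would be coefficients fitted by a logistic regression model, which is not reproduced here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(context: dict, profile: dict, weights: dict) -> float:
    """Weighted sum of per-evidence similarities between a context and a gene profile."""
    return sum(
        w * cosine(context.get(kind, Counter()), profile.get(kind, Counter()))
        for kind, w in weights.items()
    )


def disambiguate(context: dict, candidates: dict, weights: dict) -> str:
    """Return the candidate gene whose profile scores highest against the context."""
    return max(candidates, key=lambda g: combined_score(context, candidates[g], weights))


# Hypothetical weights standing in for fitted logistic regression coefficients.
weights = {"words": 0.4, "mesh": 0.6}

# Hypothetical context of one occurrence of an ambiguous symbol.
context = {
    "words": Counter({"enzyme": 2, "oxidative": 1}),
    "mesh": Counter({"Catalase": 1}),
}

# Hypothetical profiles of the two genes that share the symbol.
candidates = {
    "GENE_A": {
        "words": Counter({"enzyme": 1, "oxidative": 1}),
        "mesh": Counter({"Catalase": 1}),
    },
    "GENE_B": {
        "words": Counter({"transferase": 1}),
        "mesh": Counter({"Acetyltransferases": 1}),
    },
}

best = disambiguate(context, candidates, weights)
```

With this toy data, `best` is `"GENE_A"`, because only that profile shares any terms with the context. Keeping each evidence type as its own vector makes it straightforward to compare single-source scoring against different combination schemes by changing only `weights`.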
Similar resources
Ambiguity of Human Gene Symbols in LocusLink and MEDLINE: Creating an Inventory and a Disambiguation Test Collection
Genes are discovered almost on a daily basis and new names have to be found. Although there are guidelines for gene nomenclature, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol ...
Implementing Dynamic Minimal-prefix Tries
A modified trie-searching algorithm and corresponding data structure are introduced which permit rapid search of a dictionary for a symbol or a valid abbreviation. The dictionary-insertion algorithm automatically determines disambiguation points, where possible, for each symbol. The search operation will classify a symbol as one of the following: unknown (i.e. not a valid symbol), ambiguous (i.e...
Gene symbol disambiguation using knowledge-based profiles
MOTIVATION: The ambiguity of biomedical entities, particularly of gene symbols, is a big challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information concerning the characteristics of a particular gene that could be used to disambiguate gene symbols. RESULTS: For each gene, we create a profile with diff...
Estimation of Combining Ability and Gene Action for Agro-Morphological Characters of Rapeseed (Brassica Napus L.) Using Line×Tester Mating Design
Combining ability effects were estimated for different agronomic characters in line × tester crossing program comprising 21 hybrids produced by crossing 7 lines and 3 testers. Parents and hybrids differed significantly for general combining ability (GCA) and specific combining ability (SCA) effects, respectively. The variance due to GCA and SCA showed that gene action was predominantly additive...
Publication date: 2007